1 Introduction

[ThG: I have edited/commented sections 1-4. Overall I think you should (1) provide a simple illustration in the introduction that summarizes all 3 methods (annot, manual, co-clustring). Figure 5 can then be removed since it is hard to follow and contains too much information relating to the supplement which we wanted to remove from this vignette. (2) Further please reduce the number of code chunks to make things easier for the user. For clarity you could use meta functions and summarize in the text what they are doing. Often you include the individual steps but then there is no explanation given what they are doning. (3) The optimization part needs to be entirely removed from the main part of this vignette as there is no context given what its relevance is. (4) Please also simplify the Supplement section 5 or remove it from the vignette and perhaps include it in a script file that is then just referenced for expert users in the text. Once this part 5 has been improved I will fine edit it. (5) In addition, please address my comments given in red font throughout the text.]

1.1 Overview

The primary utility of the spatialHeatmap package is the generation of spatial heatmaps (SHM) for visualizing cell-, tissue- and organ-specific abundance patterns of biological molecules in anatomical images (J. Zhang et al. 2022). This is useful for identifying genes with spatially enriched (SE) expression patterns as well as clusters and/or network modules composed of genes sharing similar expression patterns. These functionalities are described in the main vignette of the spatialHeatmap package.

The following describes extended functionalities for integrating bulk tissue with single cell data by co-visualizing them in composite plots that combine spatial heatmaps with embedding plots of high-dimensional data. The main input data required for this co-visualization are quantitative assay data and anatomic images. The assay data can be provided in a variety of widely used tabular data structures (e.g. data.frame, SummarizedExperiment or SingleCellExperiment), while the anatomic images need to be supplied as annotated SVGs (aSVGs). The creation of aSVGs is described in the main vignette of this package. For the embedding plots of single cell data, several dimensionality reduction algorithms (e.g. PCA, UMAP or tSNE) are supported. To associate single cells with their source tissues, the user can choose among three major methods including annotation, manual and automatic (Figure 1). Like most functionalities in spatialHeatmap, these functionalities are available from within R as well as the corresponding Shiny app (Chang et al. 2021, Chang and Borges Ribeiro (2018)).

1.2 Methods for Associating Cells and Bulk Tissues

[ThG: would simplify and generalize this section to all three methods and move it to the introduction at the beginning. Most of what you are discribing here applies to all methods. By adjusting the illustration you can use the illustration for all 3 methods.]

The three methods for associating single cells with bulk tissues are illustrated in Figure 1. The annotation-based and manual methods are similar. The main difference is how the cell labels are generated. In the annotation-based method, cell annotation labels are usually present in an existing SingleCellExperiment that is created by a research group or the community such as the objects from the scRNAseq package (Risso and Cole 2022). While in the manual method, cell labels are created by custom criteria or methods such as clustering methods and stored in a tabular file. In both methods, expression profiles of every molecule (e.g. gene) under the same cell label are aggregated by taking mean or median. Then the aggregated labels are matched to aSVG spatial features (tissues, organs, etc.) according to a matching list.

By contrast, the automatic method assigns bulk tissues to single cells as labels automatically through co-clustering. The most common inputs are raw count data (e.g. RNA-seq) of chosen bulk tissues and single cells of the same organ (Figure 1.1), and the single cells come from the whole organ or at least cover the chosen bulk tissues. Bulk and single cell data are filtered to remove the most noisy data. The main difference between bulk and single cells is the sparsity in the latter. To reduce such difference, the bulk and single-cell data are combined and normalized together, and subsequently separated (Figure 1.2). Next, dimensionality reduction is performed on single-cell data using PCA or UMAP method, where one resultant dimensionality is equivalent to one molecule (e.g. gene). The top dimensionalities are clustered (Figure 1.4) to generate single cell clusters. Each cell cluster is refined by removing cells having low similarities with other cells in the same cluster (Figure 1.5). As a result, each cell cluster is more homogeneous (Figure 1.6). Next, bulk and single cell data are combined and co-clustered (Figure 1.7) to produce co-clusters.

There are three types of co-clusters: 1) Multiple bulk tissues are co-clustered with many cells. Spearman's correlation coefficient (similarity) is calculated for any bulk-cell pair in the same co-cluster. If a bulk has the largest similarity with a specific cell than any other bulk, this bulk is assigned to this cell as source bulk. For example, in Figure 1.8a bulk A is assigned to cell a1 because a1 has higher similarity with A than B; 2). Only one bulk tissue is co-clustered with many cells. This bulk is assigned to all the cells in the same co-cluster (1.8b); and 3) No bulk is present in a cluster. All these cells are not considered for downstream co-visualization (Figure 1.8c), which are potential novel cell populations. After the automatic bulk assignments to cells, cells with the same bulk label are aggregated in the same way as annotation-based or manual method and subsequently the aggregated assay profiles are mapped to aSVG features.

[ThG: ROCs can only be generated if the correct assingments are known. Readers will not understand why this is mentioned here and how they would use the tool since they don't have the true tissue assignments.] JZ: ROC is related to optimzation. The text is updated.

[ThG: also note the text in the above two paragraph needs major revisions with respect to grammar and clarity.]

Overview of co-visualization methods. In the annotation-based method, annotation labels are provided in existing `SingleCellExperiment`, while in the manual method cell labels are created by users, which is very flexible. In the automatic method, bulk tissues are automatically assigned to single cells as labels in a co-clustering process. In any of these methods, cells under the same label are aggregated by mean or median on assay profiles. Cells are colored by their aggregated expression profiles in embedding plots and aSVG spatial features matching with these cells are painted by the same colors.

Figure 1: Overview of co-visualization methods
In the annotation-based method, annotation labels are provided in existing SingleCellExperiment, while in the manual method cell labels are created by users, which is very flexible. In the automatic method, bulk tissues are automatically assigned to single cells as labels in a co-clustering process. In any of these methods, cells under the same label are aggregated by mean or median on assay profiles. Cells are colored by their aggregated expression profiles in embedding plots and aSVG spatial features matching with these cells are painted by the same colors.

2 Getting Started

2.1 Installation

The spatialHeatmap package can be installed with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(scran); library(scater); library(igraph); library(SingleCellExperiment); library(BiocParallel)

The following lists the vignette(s) of this package in an HTML browser. Clicking the name of the corresponding vignette will open it.

browseVignettes('spatialHeatmap')

To reduce runtime, intermediate results can be cached under ~/.cache/shm.

cache.pa <- '~/.cache/shm' # Set path of the cache directory

3 Mapping Cells to Tissues

Three main methods are developed for assigning cell types from single cell experiments with their source tissues, i.e. annotation-based, manual and automatic. Before detailed demonstration on these methods, a general toy example that summarizes the common features of the three methods is shown, which allows users to have a quick test.

3.1 Toy Example

To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(10)

Create a toy single cell data by using mockSCE from the scuttle package (McCarthy et al. 2017). The toy data are normalized and log2-transformed (logNormCounts), and subsequently processed by dimensionality reduction methods of PCA, TSNE, and UMAP (reduce_dim).

sce.toy <- mockSCE() # Create toy data.
sce.norm.toy <- logNormCounts(sce.toy) # Normalization and log2-transformation.
sce.dimred.toy <- reduce_dim(sce.norm.toy) # Dimensionality reduction.  

The downstream co-visualization is demenstrated on the label of cell cycles (Cell_Cycle), which includes G1, G2M, G0, S.

colData(sce.dimred.toy)[1:2, ]
## DataFrame with 2 rows and 4 columns
##          Mutation_Status  Cell_Cycle   Treatment sizeFactor
##              <character> <character> <character>  <numeric>
## Cell_001        positive          G1      treat2   1.029846
## Cell_002        positive          G1      treat2   0.952431
unique(colData(sce.dimred.toy)$Cell_Cycle)
## [1] "G1"  "S"   "G0"  "G2M"

As shown in (Figure 1), to map single cell data onto aSVG features, assay profiles under the same cell label should be aggregated by mean or median (aggr='mean'). The aggregation is performed on log2-transformed normlized data as specified by assay.na='logcounts'. After aggregation, all cells are collapsed into 4 cell cycle labels.

sce.aggr.toy <- aggr_rep(sce.dimred.toy, assay.na='logcounts', sam.factor='Cell_Cycle', con.factor=NULL, aggr='mean')
colData(sce.aggr.toy)
## DataFrame with 4 rows and 4 columns
##     Mutation_Status  Cell_Cycle   Treatment sizeFactor
##         <character> <character> <character>  <numeric>
## G1         positive          G1      treat2   1.029846
## S          positive           S      treat1   0.963111
## G0         positive          G0      treat2   1.000982
## G2M        negative         G2M      treat1   0.992205

An example aSVG file of mouse brain is included in spatialHeatmap, and spatial features are extracted.

svg.mus.brain <- system.file("extdata/shinyApp/example", "mus_musculus.brain.svg", package="spatialHeatmap")
feature.df <- return_feature(svg.path=svg.mus.brain)
tail(feature.df$feature) # Partial features are shown.
## [1] "brainstem"                    "midbrain"                    
## [3] "dorsal.plus.ventral.thalamus" "hypothalamus"                
## [5] "nose"                         "corpora.quadrigemina"

One collapsed cell cycle is equivalent to one aSVG feature. Before co-visualization, their matching relationship needs to be defined. In annotation-based and manual methods, the matching relationship is defined by users in a named list, while in the automatic method it is automatically defined, which is the essential difference across the three methods. Note, one collapsed cell label can be matched to multiple aSVG features but not vice versa. The following is an example named list.

lis.match.toy <- list(G1=c('hypothalamus'), G0=c('brainstem', 'medulla.oblongata'))

The co-visualization is created on gene Gene_0010. The embedding plot includes all single cells before aggregation. The G1 and G0 cells are colored by their respecitive aggregated expression profiles (data=sce.aggr.toy) and indicated in the legend at the bottom. In the nearby spatial heatmap plot, aSVG features are filled by the same color as the matching cells defined in the named list (lis.match.toy). The cell.group argument indicates which column is considered as cell labels in the colData slot of sce.aggr.toy.

shm.lis.toy <- spatial_hm(svg.path=svg.mus.brain, data=sce.aggr.toy, ID=c('Gene_0010'), height=0.7, legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=3, sce.dimred=sce.dimred.toy, dimred='PCA', cell.group='Cell_Cycle', assay.na='logcounts', tar.cell=c('matched'), lis.rematch=lis.match.toy, bar.width=0.11, dim.lgd.nrow=1)
Example co-visualization plot. The co-visualization is created on gene `Gene_0010`. Single cells in the embedding plot and their matching aSVG features in the spatial heatmap are filled by the same color according to aggregated assay profiles within each cell label.

Figure 2: Example co-visualization plot
The co-visualization is created on gene Gene_0010. Single cells in the embedding plot and their matching aSVG features in the spatial heatmap are filled by the same color according to aggregated assay profiles within each cell label.

3.2 Annotation-based

The annotation method mainly takes advantage of annotation labels in existing SCE objects created by the community, such as the SCE objects from the R package scRNAseq (Risso and Cole 2022).

[ThG: the Shiny part makes no sense in this annotation section since mouse actions would fall under manual mode. There must be an upload option in the app.]

The following example uses single cell data from oligodendrocytes of mouse brain (Marques et al. 2016), which is downloaded in scRNAseq (Risso and Cole 2022) with minor changes. Prior to co-visualizing, the single cell data are pre-processed with standard QC and normalization methods. The details of these steps are described in the OSCA tutorial.

set.seed(10)

The single cell sample data set can be imported as follows.

sce.pa <- system.file("extdata/shinyApp/example", "sce_manual_mouse.rds", package="spatialHeatmap")
sce <- readRDS(sce.pa)

The quatity control (QC), normalization, and dimensionality reduction steps can each be performed with a single command (process_cell_meta).

[ThG: you need to at least briefly state what algorithms/method are used in each step.]

[Short version]

In QC, common per-cell metrics are cacluated such as library size, mitochodrial gene percentage, etc. Then problematic cells are filtered out according to these metrics. Refer to perCellQCMetrics and perCellQCFilters in the scuttle package for more details (McCarthy et al. 2017). In normalization, per-cell size factors are computed using a scaling normalization method followed by a deconvolution strategy. Single cells are finally normalized by these per-cell size factors. See more details in quickCluster, computeSumFactors from the scran package (Lun, McCarthy, and Marioni 2016), and logNormCounts from the scuttle package (McCarthy et al. 2017).

[Long version]

In the QC, frequently used per-cell metrics are calculated for identifying problematic cells, such as library size, number of detected features above a threshold, mitochodrial gene percentage, etc. Then these metrics are used to determine outlier cells based on median-absolute-deviation (MAD). Refer to perCellQCMetrics and perCellQCFilters in the scuttle package for more details (McCarthy et al. 2017). In the normalization, a quick-clustering method is applied to divide cells into clusters. Then a scaling normalization method is performed to obtain per-cluster size factors. Next, the size factor in each cluster is decomposed into per-cell size factors by a deconvolution strategy. Finally, all cells are normalized by per-cell size factors. See more details in quickCluster, computeSumFactors from the scran package (Lun, McCarthy, and Marioni 2016), and logNormCounts from the scuttle package (McCarthy et al. 2017).

In dimensionality reduction, the high-dimensional gene expression data are embedded into a 2-3 dimensional space using PCA, tSNE and UMAP. All three embedding result sets are stored in the SingleCellExperiment object. Details are seen in denoisePCA from scran (Lun, McCarthy, and Marioni 2016), and runUMAP, runTSNE from scater (McCarthy et al. 2017). Subsequently, the UMAP result is visualized as an example in form of a scatter plot where the dots are colored by the corresponding cell labels.

sce.dimred <- process_cell_meta(sce, qc.metric=list(subsets=list(Mt=rowData(sce)$featureType=='mito'), threshold=1))

plotUMAP(sce.dimred, colour_by="label")
Embedding plot single cells. The cells are colored by labels in the `label` column in `colData`.

Figure 3: Embedding plot single cells
The cells are colored by labels in the label column in colData.

The following command returns a slice of cell metadata in the colData slot of SCE. In principle, every column can be utilized as annotation labels for the purpose of co-visualization. In this example, the label and expVar columns are taken as annotation labels and experiment treatment variables respectively.

colData(sce.dimred)[25:27, ]
## DataFrame with 3 rows and 7 columns
##                              label         age
##                        <character> <character>
## C1.1772125.078.A03 corpus.callosum         p21
## C1.1772096.168.C12    hypothalamus         p28
## C1.1772117.102.H11     dorsal.horn         p21
##                    inferred.cell.type         Sex      strain
##                           <character> <character> <character>
## C1.1772125.078.A03              MFOL1           M  PDGFRa-Cre
## C1.1772096.168.C12              MFOL1           M      C57BL6
## C1.1772117.102.H11              MFOL1           ?      C57BL6
##                            expVar sizeFactor
##                       <character>  <numeric>
## C1.1772125.078.A03        control   1.077659
## C1.1772096.168.C12 6h.post.stress   0.758306
## C1.1772117.102.H11        control   1.070574

[ThG: First, the previous is an incomplete sentence that should only be used in a title. Second, this section is confusing. What do you mean by 'aggregate'? Isn't this done already by having the annotations available? Aren't you using the existing annotations to compute a summary statistics for each predefined group of cells belonging to a tissue specified by the annotations? Subsequently, the summary values are used to define the colors in the spatial heatmap, correct? If so try to outline this more clearly. The same applies ot the help file of the aggr_rep function. The text in this help file is hard to follow.

Each annotation label in the label column of colData defines a cell group or cluster (sam.factor='label'). The expression profiles of each gene are aggregated by mean (aggr='mean') within each cell group. If experiment variables are provided, here con.factor='expVar', the aggregation will be performed on the new variable as a combination of label and expVar. The aggregation is carried out on normalized data indicated by assay.na='logcounts'.

sce.aggr <- aggr_rep(sce.dimred, assay.na='logcounts', sam.factor='label', con.factor='expVar', aggr='mean')

A slice of aggregated data is shown. hypothalamus and control is an annotation label and experiment treatment respectively. The abundance mean of gene Tcea1 under the combined variable hypothalamus__control is 1.440910.

as.matrix(logcounts(sce.aggr)[1:2, 1:2])
##         hypothalamus__control SN.VTA__control
## Tcea1                1.440910         1.62474
## Atp6v1h              2.275958         1.95121

Next, the spatial features are extracted with the return_feature function from an aSVG image of mouse brain. This sample aSVG image file is provided by the package.

svg.mus.brain <- system.file("extdata/shinyApp/example", "mus_musculus.brain.svg", package="spatialHeatmap")
feature.df <- return_feature(svg.path=svg.mus.brain)
tail(feature.df$feature)
## [1] "brainstem"                    "midbrain"                    
## [3] "dorsal.plus.ventral.thalamus" "hypothalamus"                
## [5] "nose"                         "corpora.quadrigemina"

[ThG: Why is do you select here only one feature/tissue instead of all annotated features. Also the need/utility of this list is unclear.]

In practice, the identifiers of aSVG features and annotation label counterparts are not exactly the same, since they might come from different sources, such as corpus.striatum in aSVG and corpus.callosum in label. To coordinate the differences and give users flexibilities, a named list is required to match cell labels and aSVG features, where cell labels are in the name slots and aSVG features are list elements. The example list is created on three tissues that have same or similar countterparts between aSVG features and cell labels. Note, one cell label could be matched to multiple aSVG features, not vice versa.

lis.match <- list(hypothalamus=c('hypothalamus'), cortex.S1=c('cerebral.cortex'), corpus.callosum=c('corpus.striatum'))

[ThG: there needs to be a detailed explanation or legend explaining what is shown on the plot. Again coloring only one tissue creates the impression that tissues need to be depicted one by one instead of a single plot. I would change this example to highlight most tissues. Most genes are not so selectively expressed to show up in only one tissue, so choosing here an example where this is the case shouldn't be hard.

Co-visualization is created on aggregated expression profiles (sce.aggr). Here gene Cops5 is chosen as an example. The data includes two experiment variables control and 6h.post.stress, thus covisualization plots are created under each condition. The legend under embedding plots shows the cell labels present in the matching list (lis.match). The cell population of the same label under the same condition are colored by their aggregated expression profiles, and the bulk tissue counterpart is colored by the same color in the anatomical image. In the downloaded single cell data, only hypothalamus is present under 6h.post.stress, so only hypothalamus cells and bulk tissues are colored under 6h.post.stress. The tar.cell argument specifies which cell populations to show. It can be one or multiple cell labels, and matched indicates all cell labels in the matching list.

shm.lis <- spatial_hm(svg.path=svg.mus.brain, data=sce.aggr, ID=c('Cops5'), height=0.7, legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=2, sce.dimred=sce.dimred, dimred='PCA', cell.group='label', assay.na='logcounts', tar.cell=c('matched'), lis.rematch=lis.match, bar.width=0.09, dim.lgd.nrow=2)
Co-visualizing single cells and bulk tissues by annotation cell labels. The expression profiles of gene `Cops5` are used. Cell population of the same annotation label is colored by their aggregated expression profiles under the same condition, and the bulk tissue counterpart is colored by the same color.

Figure 4: Co-visualizing single cells and bulk tissues by annotation cell labels
The expression profiles of gene Cops5 are used. Cell population of the same annotation label is colored by their aggregated expression profiles under the same condition, and the bulk tissue counterpart is colored by the same color.

3.3 Manual

In the manual method, cell group or cluster labels are manually created according to criteria or methods chosen by users such as clustering methods, which is very flexible than the annotation method. The resultant manual clusters need to be stored in a tabular file, then added to the SingleCellExperiment by the manual_cluster function. The same single cell data and aSVG file as in the annotation method are used to demonstrate the manual method. The steps of quality control, normalization, and dimensionality reduction are the same with the annotation method, while a subsequent step of adding manual clusters to the SingleCellExperiment is required.

[ThG: I don't understand why the manual method uses clustering??? If clustering needs to be used here then it is unclear what clustering algorithim is applied.]

An example manual cluster file is included in spatialHeatmap, where cluster labels are created by the clustering function cluster_cell. Refer to the help file for how to use this function. Two columns are required in this file. The cell column contains single cell identifiers present in the rownames of colData slot in the SingleCellExperiment, while the cluster column contains the manual cluster labels.

manual.clus.mus.sc.pa <- system.file("extdata/shinyApp/example", "manual_cluster_mouse_brain.txt", package="spatialHeatmap")
manual.clus.mus.sc <- read.table(manual.clus.mus.sc.pa, header=TRUE, sep='\t')
manual.clus.mus.sc[1:3, ]
##                 cell cluster
## 1 C1.1772078.029.F11   clus6
## 2 C1.1772089.202.E04   clus6
## 3 C1.1772099.091.D10   clus3

Manual cluster labels are exclusively included in the cluster column in the colData slot by the function manual_cluster, which does not interfere with existing columns.

sce.clus <- manual_cluster(sce=sce.dimred, df.clus=manual.clus.mus.sc)
colData(sce.clus)[1:3, ]
## DataFrame with 3 rows and 8 columns
##                        cluster        label         age
##                    <character>  <character> <character>
## C1.1772078.029.F11       clus6 hypothalamus         p22
## C1.1772089.202.E04       clus6       SN.VTA         p22
## C1.1772099.091.D10       clus3  dorsal.horn         p20
##                    inferred.cell.type                    Sex
##                           <character>            <character>
## C1.1772078.029.F11              NFOL1                      F
## C1.1772089.202.E04              NFOL2 pooled male and female
## C1.1772099.091.D10              NFOL2                      ?
##                         strain      expVar sizeFactor
##                    <character> <character>  <numeric>
## C1.1772078.029.F11      C57BL6     control    1.22941
## C1.1772089.202.E04         CD1     control    1.16755
## C1.1772099.091.D10      C57BL6     control    1.08944

Embedding plot of single cells colored by manual clusters in the cluster column in colData.

plotUMAP(sce.clus, colour_by="cluster")
Embedding plot single cells. The cells are labeled by clusters in the `cluster` column in `colData`.

Figure 5: Embedding plot single cells
The cells are labeled by clusters in the cluster column in colData.

The same mouse brain aSVG as the annotation method is used.

tail(feature.df$feature)
## [1] "brainstem"                    "midbrain"                    
## [3] "dorsal.plus.ventral.thalamus" "hypothalamus"                
## [5] "nose"                         "corpora.quadrigemina"

Similar with the annotation method, create a named list to match manual cell cluster clus1 with aSVG feature hypothalamus, and cluster clus3 with cerebral.cortex and midbrain. The latter demonstrates one cell cluster is matched to multiple aSVG features.

lis.match.clus <- list('clus1'=c('cerebral.cortex'), 'clus3'=c('hypothalamus', 'midbrain'))

[ThG: same problem as before the aggregate part is unclear.]

Aggregate cells by taking average (aggr='mean') of gene expression profiles within each manual cluster in the cluster column (sam.factor='cluster'). To illustrate aggregation only on cells, the experiment variable is not considered (con.factor=NULL). assay.na='logcounts' indicates the aggregation is based on normalized data.

sce.clus.aggr <- aggr_rep(sce.clus, assay.na='logcounts', sam.factor='cluster', con.factor=NULL, aggr='mean')

Co-visualization plots is built on aggretated data (sce.clus.aggr). Here gene Rpl7 is chosen. The manual clusters in the matching list (lis.match.clus) are colored by within-cluster aggregation of Rpl7 abundance profiles, and indicated by the legends under the embedding plots. The same color of each matched cluster is used to paint the bulk tissue counterpart in the anatomical image.

shm.lis <- spatial_hm(svg.path=svg.mus.brain, data=sce.clus.aggr, ID=c('Rpl7'), height=0.7, legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=3, sce.dimred=sce.clus, dimred='PCA', cell.group='cluster', assay.na='logcounts', tar.cell=c('matched'), lis.rematch=lis.match.clus, bar.width=0.09, dim.lgd.nrow=1)
Co-visualizing single cells and bulk tissues by manual cell clusters. The expression profiles of gene `Rpl7` are used. The matched manual clusters are colored by within-cluster aggregation of expression profiles and the same colors are used to color the bulk tissue counterparts in the anatomical image.

Figure 6: Co-visualizing single cells and bulk tissues by manual cell clusters
The expression profiles of gene Rpl7 are used. The matched manual clusters are colored by within-cluster aggregation of expression profiles and the same colors are used to color the bulk tissue counterparts in the anatomical image.

[ThG: I don't understand why this section is called 'Manual'?]

3.4 Automatic

In addition to the annatation and manual methods, an automatic method is provided that matches cells and bulk tissues for co-visualization (Figure 1). The automatic process is carried out by combining and co-clustering bulk and single cell data. The most desirable bulk and single cell data would be derived from the same organ or tissue. [ThG: why is that a requirement? You need to better explain what the requirements are here.] If same organ, the chosen bulk tissues are major or complete tissues comprising this organ, while the single cell data are assayed on the region covering at least the chosen bulk tissues or on the whole organ. Similarly, if chosen bulk tissues are sub-tissues composing partial or the complete tissue, the single cells need to cover at least the chosen sub-tissues or the whole tissue. The key point is bulk tissues and single cells have anatomical overlap.

To obtain optimal default settings for the automatic method, this method is optimized and tested on real bulk and single cell datasets. The optimization utilities are included in spatialHeatmap, so users have the choice to optimize this method on their own real data. This section showcases application of the automatic method on mouse brain data with default settings obtained through optimization. The optimization on toy data and real data are explained in an external file [JZ: update in the optimzation section is in progress.] . The bulk RNA-seq data are generated in a research on the impact of placental endocrine on mouse cerebellar development (Vacher et al. 2021) and the scRNA-seq data are from a study of mouse brain molecular atlas (Ortiz et al. 2020). Both bulk and single cell data sets are reduced for demonstration purpose.

3.4.1 Pre-processing

To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(10)

The example bulk and single cell count data are included in spatialHeatmap and imported. Replicates of the same bulk tissue should have the same identifiers. For example, all replicates of Cerebral Cortex have the label CERE.CORTEX instead of CERE.CORTEX1, CERE.CORTEX2, and so on. Since cells are aggregated based on uniuqe bulk tissue labels in downstream procedures.

# Example bulk data.
blk.mus.pa <- system.file("extdata/shinyApp/example", "bulk_mouse_cocluster.txt", package="spatialHeatmap") 
blk.mus <- as.matrix(read.table(blk.mus.pa, header=TRUE, row.names=1, sep='\t', check.names=FALSE))
blk.mus[1:3, 1:5]
##          CERE.CORTEX HIPP HYPOTHA CERE CERE.CORTEX
## AI593442         177  256      50   24         285
## Actr3b           513 1465     228  244         666
## Adcy1            701 1243      57 1910         836
# Example single cell data.
sc.mus.pa <- system.file("extdata/shinyApp/example", "cell_mouse_cocluster.txt", package="spatialHeatmap") 
sc.mus <- as.matrix(read.table(sc.mus.pa, header=TRUE, row.names=1, sep='\t', check.names=FALSE))
sc.mus[1:3, 1:5]
##          isocort isocort isocort isocort olfa
## AI593442       2       2       2       5    0
## Actr3b         3       5       4       4    1
## Adcy1          3       6       6       6    0

Both bulk and single cell data are filtered to remove low quality values. In bulk data, genes are filtered with two filters. The first one is the proportion of counts over a threshold (A) in a gene is at least p (pOA), and the second is the coefficient of variance (CV) is within a specified range. Only genes passing these criteria are retained. Refer to filterfun in the geneflter package for more details (Gentleman et al. 2018). In single cell data, genes and cells are filtered according to proportion of counts over a minimum (min.cnt) is larger than a threshold (p.in.gen for genes and p.in.cell for cells). For instance, min.cnt=1 and p.in.gen=0.1 suggests if the proportion of counts over 1 is at least 0.1 in a gene, this gene is retained.

blk.mus <- filter_data(data=blk.mus, sam.factor=NULL, con.factor=NULL, pOA=c(0.1, 5), CV=c(0.2, 100), verbose=FALSE) 
mus.lis <- filter_cell(lis=list(sc.mus=sc.mus), bulk=blk.mus, gen.rm=NULL, min.cnt=1, p.in.cell=0.5, p.in.gen=0.1, verbose=FALSE) 

The main different between bulk and single cell data is the sparsity in the latter. To reduce such difference, they are combined and normalized together, then separated.

mus.lis.nor <- read_cache(cache.pa, 'mus.lis.nor') 
if (is.null(mus.lis.nor)) { 
  mus.lis.nor <- norm_multi(dat.lis=mus.lis, cpm=FALSE)
  save_cache(dir=cache.pa, overwrite=TRUE, mus.lis.nor)
}

The same aSVG file of mouse brain as the toy example is used. The process of extracting aSVG features is not shown.

tail(feature.df$feature) # Partial features are shown.
## [1] "brainstem"                    "midbrain"                    
## [3] "dorsal.plus.ventral.thalamus" "hypothalamus"                
## [5] "nose"                         "corpora.quadrigemina"

The matching between bulk and single cells are automatic, while the matching between aSVG features and bulk tissues needs to be defined by users in a data.frame. Since there is no uniform naming scheme for aSVG features or bulk tissues. In most cases, identifiers of aSVG features and bulk tissues are user dependent, especially when they are from different research groups. The matching data.frame is utilized to coordinate differences of naming schemes between aSVG features and bulk tissues. In addition, it gives flexibility for users to choose desired identifiers.

The matching table for this example is included in spatialHeatmap and imported. The SVGBulk and dataBulk columns are aSVG features and bulk tissues respectively. Note, these two columns names are fixed so that the algorithm is able to recognize them.

[ThG: again users won't have this. The optimization belongs into the supplement.] [JZ: It is related to optimization.]

match.mus.brain.pa <- system.file("extdata/shinyApp/example", "match_mouse_brain_cocluster.txt", package="spatialHeatmap")
df.match.mus.brain <- read.table(match.mus.brain.pa, header=TRUE, row.names=1, sep='\t')
df.match.mus.brain
##           SVGBulk    dataBulk
## 1      cerebellum        CERE
## 2     hippocampus        HIPP
## 3 cerebral.cortex CERE.CORTEX
## 4     hippocampus        HIPP
## 5    hypothalamus     HYPOTHA

3.4.2 Co-clustering

The processes of clustering single cells (Figure 1.4), refining cell clusters (Figure 1.5), and co-clustering bulk and single cells (Figure 1.7-8) are performed in a single function call. Setting return.all=TRUE returns results in a list (res.lis).

res.lis <- read_cache(cache.pa, 'res.lis')
if (is.null(res.lis)) {
  res.lis <- coclus_meta(bulk=mus.lis.nor$bulk, cell=mus.lis.nor$sc.mus, df.match=df.match.mus.brain, return.all=TRUE, multi.core.par=MulticoreParam(workers=1, RNGseed=50), verbose=FALSE)
  res.lis <- res.lis[[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, res.lis)
}

The source bulk tissue assignments are stored in the colData slot of sce.asg in the result list (res.lis), which are partialy shown below. The assignedBulk indicates assigned source bulk tissues for each cell, and the SVGBulk contains corresponding aSVG features, which is based on the matching table df.match.mus.brain. The predictor includes similarities (Spearman's correlation coefficient) between single cells and assigned bulk tissues.

colData(res.lis$sce.asg)[1:2, c('cell', 'assignedBulk', 'predictor', 'SVGBulk')]
## DataFrame with 2 rows and 4 columns
##                cell assignedBulk   predictor         SVGBulk
##         <character>  <character> <character>     <character>
## isocort     isocort  CERE.CORTEX        0.33 cerebral.cortex
## isocort     isocort  CERE.CORTEX       0.467 cerebral.cortex

The co-cluster information is stored in the colData slot of sce.bulk.cell in the result list (res.lis). The cluster and cell columns indicate co-cluster labels and bulk/cell identifiers respectively.

colData(res.lis$sce.bulk.cell)[1:3, ]
## DataFrame with 3 rows and 2 columns
##                 cluster        cell
##             <character> <character>
## CERE.CORTEX      clus11 CERE.CORTEX
## HIPP             clus20        HIPP
## HYPOTHA           clus7     HYPOTHA

Each co-cluster consisting of bulk and cells can be visualized in an embedding plot. The following plot is the visualization of co-cluster clus11, where other bulk and cells are in gray.

plot_dim(res.lis$sce.bulk.cell, dim='PCA', color.by='cluster', group.sel='clus11')
Embedding plot of co-clusters. Dots represent bulk tissues or single cells. The bulk and cells in the target co-cluster are in blue while all other bulk and cells are in gray.

Figure 7: Embedding plot of co-clusters
Dots represent bulk tissues or single cells. The bulk and cells in the target co-cluster are in blue while all other bulk and cells are in gray.

[ThG: users would expect to see a plot next, yet a new subsection starts now that lacks context. Is this an oversight?]

3.4.3 Tailoring Co-clustering Results

The automatic (co-clustering) method is optimized and tested on a small number of real bulk and single cell datasets, due to the limited availability of single cell datasets where single cells are assayed on a whole organ or whole large tissue and each single cell is well annotated. Thus, the accuracy of automatic source bulk tissue assignments might be also limited. As a remedy, utilities are developed for tailoring the automatic bulk tissue assignments at user preferences. Note, the tailoring step is optional, if users are satisfied with the bulk assignments, just proceed to the co-visualization step.

The tailoring can be performed in command line or on a Shiny app. This section illustrates the command-line based tailoring. First visualize single cells after single-cell cluster refining (Figure 1.5) in an embedding plot as shown below. The x-y axis breaks (x.break, y.break) can be tuned so as to define accurate coordinates in the next step.

plot_dim(res.lis$cell.refined, dim='PCA', color.by='cell', x.break=seq(-10, 10, 2), y.break=seq(-10, 10, 2))
PCA embedding plot of mouse brain single cell data. Single cells right after cluster refining are plotted.

Figure 8: PCA embedding plot of mouse brain single cell data
Single cells right after cluster refining are plotted.

Define desired bulk tissues (desiredSVGBulk) for cells selected by x-y coordinate ranges (x.min, x.max, y.min, y.max) in the embedding plot in form of a data.frame (df.desired.bulk). The dimred reveals where the coordinates come from and are required. In this example, cerebral.cortex is the desired bulk tissue for cells in the two selected regions.

df.desired.bulk <- data.frame(x.min=c(-6, -2), x.max=c(-4, 0), y.min=c(6, 4), y.max=c(8, 6), desiredSVGBulk=c('cerebral.cortex', 'cerebral.cortex'), dimred='PCA')
df.desired.bulk
##   x.min x.max y.min y.max  desiredSVGBulk dimred
## 1    -6    -4     6     8 cerebral.cortex    PCA
## 2    -2     0     4     6 cerebral.cortex    PCA

Incorporate desired bulk assignments to co-clustering results by calling refine_asg. The predictors corresponding to desired bulk are internally set at the maximum of 1. The thr argument is a predictor threshold and used to filter bulk assignments.

[ThG: lack of context; hard to follow; does this need to be here?]

res.lis <- refine_asg(res.lis=res.lis, thr=0, df.desired.bulk=df.desired.bulk, df.match=df.match.mus.brain)

3.4.4 Co-visualization

Similar with the annotation-based and manual methods, cells under the same SVG feature (bulk tissue) are aggregated by averaging (aggr='mean') normalized assay profiles (assay.na='logcounts'). The aggregated assay profiles are used to color corresponding aSVG features.

[ThG: I am unable follow. Please provide clearer explantions to help readers what is done here or remove this part. The fact that you are not showing any output or visualization in this and other code sections makes it very hard to understand what is happening throughout.]

sce.aggr <- aggr_rep(data=res.lis$sce.asg, assay.na='logcounts', sam.factor='SVGBulk', con.factor=NULL, aggr='mean')

The co-visualization of bulk and single cells is built on aggregated gene abundance profiles (profile=TRUE) of gene Adcy1. Two target aSVG features (bulk tissues) are selected, i.e. cerebellum and cerebral.cortex. In the embedding plot, cells matching the same target aSVG feature in the anatomical image are colored according to their aggregated assay profiles, and the same color is used to fill their matching aSVG features. In the embedding plot, the two small colored clusters on the top are the cells defined in df.desired.bulk and are also counted when aggregating expression profiles.

shm.lis1 <- spatial_hm(svg.path=svg.mus.brain, data=sce.aggr, ID=c('Adcy1'), legend.nrow=4, sce.dimred=res.lis$cell.refined, dimred='PCA', assay.na='logcounts', tar.bulk=c('cerebellum', 'cerebral.cortex'), profile=TRUE, dim.lgd.text.size=10, dim.lgd.nrow=1, bar.width=0.1)
Co-visualizing bulk and single cells of mouse brain with abundance profiles. The aggregated expression profile of gene `Adcy1` in cells matching the same bulk is used to fill these cells and bulk tissues.

Figure 9: Co-visualizing bulk and single cells of mouse brain with abundance profiles
The aggregated expression profile of gene Adcy1 in cells matching the same bulk is used to fill these cells and bulk tissues.

Bulk tissues and single cells are co-visualized without abundance profiles (profile=FALSE). The aSVG features (bulk tissues) of interest are filled by different colors in the anatomical image, while the matching cells are indicated in the embedding plot with the same color as their source bulk tissues. At the top of embedding plot, the two small light green clusters are the selected cells in the tailoring section.

shm.lis2 <- spatial_hm(svg.path=svg.mus.brain, data=sce.aggr, legend.nrow=4, sce.dimred=res.lis$cell.refined, dimred='PCA', tar.bulk=c('cerebellum', 'cerebral.cortex'), profile=FALSE, dim.lgd.text.size=10, dim.lgd.nrow=1)
Co-visualizing bulk and single cells of mouse brain without abundance profiles. The matching between cells and source bulk tissues (aSVG features) is denoted by the same color between the embedding plot and anatomical image.

Figure 10: Co-visualizing bulk and single cells of mouse brain without abundance profiles
The matching between cells and source bulk tissues (aSVG features) is denoted by the same color between the embedding plot and anatomical image.

4 Using Shiny App

[ThG: this section would need to show a screenshot showing where this is used in the shiny app.]

The co-visualization feature is included in the integrated Shiny app that is an GUI implementation of spatialHeatmap, including annotation-based, maual, and automatic methods. To start this app, simply call shiny_shm() in R. Below is a screenshot of the co-visulization output generated by the automatic method.

Screenshot of the co-visualization output in Shiny app. The co-visualization plot is generated by the automatic method.

Figure 11: Screenshot of the co-visualization output in Shiny app
The co-visualization plot is generated by the automatic method.

When using the Shiny app, single cell data in annotation-based and manual methods or combined single cell and bulk data in automatic method are stored in a SingleCellExperiment object and saved in an .rds file by saveRDS, then the .rds file should be uploaded to the app. In the manual method, the manual clusters should be included in the cluster column in the colData slot by calling manual_cluster before creating an .rds file, as shown here. In the automatic method, bulk tissues and single cells are labeled by bulk and cell respectively in the bulkCell column in colData slot. The matching table between bulk tissues and aSVG features is stored in the metadata list with the name df.match. The example .rds file below illustrates these rules.

sce.auto <- readRDS(system.file("extdata/shinyApp/example", 'sce_auto_bulk_cell_mouse_brain.rds', package="spatialHeatmap"))
colData(sce.auto)
## DataFrame with 2219 rows and 1 column
##                bulkCell
##             <character>
## CERE.CORTEX        bulk
## HIPP               bulk
## HYPOTHA            bulk
## CERE               bulk
## CERE.CORTEX        bulk
## ...                 ...
## midbrain           cell
## midbrain           cell
## midbrain           cell
## hindbrain          cell
## midbrain           cell
metadata(sce.auto)$df.match
##           SVGBulk    dataBulk
## 1      cerebellum        CERE
## 2     hippocampus        HIPP
## 3 cerebral.cortex CERE.CORTEX
## 4     hippocampus        HIPP
## 5    hypothalamus     HYPOTHA

5 Supplementary Section

5.1 Assigning Desired Bulk on Shiny App

[ThG: I don't understand the above paragraph. The shiny part is totally out of context.]

This section describes tailoring co-clustering results on the convenience Shiny app, which is lauched by calling desired_bulk_shiny.

Figure 12 is the screenshot of the Shiny app. The file to upload is an .rds file of a SingleCellExperiment object saved by saveRDS. An example of how to generate such a file is seen in the help file of desired_bulk_shiny. On the left embedding plot, cells are selected with the "Lasso Select" tool. On the right, selected cells and their coordinates are listed in a table, and the desired bulk tissues (aSVG features) can be selected from the dropdown list, here cerebral.cortex. To download the table just click the "Download" button. The "Help" button gives more instructions.

Screenshot of the Shiny app for selecting desired bulk tissues. On the left is the embedding plot of single cells, where target cells are selected with the "Lasso Select" tool. On the right, desired bulk tissues are assigned for selected cell.

Figure 12: Screenshot of the Shiny app for selecting desired bulk tissues
On the left is the embedding plot of single cells, where target cells are selected with the "Lasso Select" tool. On the right, desired bulk tissues are assigned for selected cell.

An example of desired bulk downloaded from the convenience Shiny app is shown below. The x-y coordinates refer to single cells in embbeding plots (dimred). The df.desired.bulk is ready to use in the tailoring section.

desired.blk.pa <- system.file("extdata/shinyApp/example", "selected_cells_with_desired_bulk.txt", package="spatialHeatmap")
df.desired.bulk <- read.table(desired.blk.pa, header=TRUE, row.names=1, sep='\t')
df.desired.bulk[1:3, ]
##          x        y key desiredSVGBulk dimred
## 1 4.389422 6.917510  62     cerebellum    PCA
## 2 4.586877 8.077207  75     cerebellum    PCA
## 3 6.366188 7.695782  76     cerebellum    PCA


6 Version Informaion

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets 
## [7] methods   base     
## 
## other attached packages:
##  [1] scRNAseq_2.10.0             knitr_1.39                 
##  [3] BiocStyle_2.24.0            BiocParallel_1.30.0        
##  [5] igraph_1.3.1                scater_1.24.0              
##  [7] ggplot2_3.3.6               scran_1.24.0               
##  [9] scuttle_1.6.0               SingleCellExperiment_1.18.0
## [11] SummarizedExperiment_1.26.1 Biobase_2.56.0             
## [13] GenomicRanges_1.48.0        GenomeInfoDb_1.32.1        
## [15] IRanges_2.30.0              S4Vectors_0.34.0           
## [17] BiocGenerics_0.42.0         MatrixGenerics_1.8.0       
## [19] matrixStats_0.62.0          spatialHeatmap_2.1.3       
## [21] nvimcom_0.9-25             
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.3                rtracklayer_1.56.0           
##   [3] tidyr_1.2.0                   bit64_4.0.5                  
##   [5] irlba_2.3.5                   DelayedArray_0.22.0          
##   [7] data.table_1.14.2             rpart_4.1.16                 
##   [9] AnnotationFilter_1.20.0       KEGGREST_1.36.0              
##  [11] RCurl_1.98-1.6                doParallel_1.0.17            
##  [13] generics_0.1.2                GenomicFeatures_1.48.0       
##  [15] preprocessCore_1.58.0         ScaledMatrix_1.4.0           
##  [17] RSQLite_2.2.13                bit_4.0.4                    
##  [19] xml2_1.3.3                    httpuv_1.6.5                 
##  [21] assertthat_0.2.1              viridis_0.6.2                
##  [23] xfun_0.30                     hms_1.1.1                    
##  [25] jquerylib_0.1.4               evaluate_0.15                
##  [27] promises_1.2.0.1              fansi_1.0.3                  
##  [29] restfulr_0.0.13               progress_1.2.2               
##  [31] caTools_1.18.2                dbplyr_2.1.1                 
##  [33] DBI_1.1.2                     geneplotter_1.74.0           
##  [35] htmlwidgets_1.5.4             purrr_0.3.4                  
##  [37] ellipsis_0.3.2                RSpectra_0.16-1              
##  [39] dplyr_1.0.9                   backports_1.4.1              
##  [41] bookdown_0.26                 annotate_1.74.0              
##  [43] biomaRt_2.52.0                sparseMatrixStats_1.8.0      
##  [45] vctrs_0.4.1                   ensembldb_2.20.1             
##  [47] cachem_1.0.6                  withr_2.5.0                  
##  [49] checkmate_2.1.0               GenomicAlignments_1.32.0     
##  [51] prettyunits_1.1.1             cluster_2.1.3                
##  [53] ExperimentHub_2.4.0           grImport_0.9-5               
##  [55] lazyeval_0.2.2                crayon_1.5.1                 
##  [57] genefilter_1.78.0             edgeR_3.38.0                 
##  [59] pkgconfig_2.0.3               ProtGenerics_1.28.0          
##  [61] vipor_0.4.5                   nnet_7.3-17                  
##  [63] rlang_1.0.2                   lifecycle_1.0.1              
##  [65] filelock_1.0.2                BiocFileCache_2.4.0          
##  [67] rsvd_1.0.5                    AnnotationHub_3.4.0          
##  [69] av_0.7.0                      rsvg_2.3.1                   
##  [71] rngtools_1.5.2                Matrix_1.4-1                 
##  [73] Rhdf5lib_1.18.0               base64enc_0.1-3              
##  [75] beeswarm_0.4.0                png_0.1-7                    
##  [77] viridisLite_0.4.0             rjson_0.2.21                 
##  [79] bitops_1.0-7                  shinydashboard_0.7.2         
##  [81] KernSmooth_2.23-20            visNetwork_2.1.0             
##  [83] rhdf5filters_1.8.0            pROC_1.18.0                  
##  [85] Biostrings_2.64.0             blob_1.2.3                   
##  [87] DelayedMatrixStats_1.18.0     doRNG_1.8.2                  
##  [89] stringr_1.4.0                 jpeg_0.1-9                   
##  [91] gridGraphics_0.5-1            rols_2.24.0                  
##  [93] beachmat_2.12.0               scales_1.2.0                 
##  [95] memoise_2.0.1                 magrittr_2.0.3               
##  [97] plyr_1.8.7                    gplots_3.1.3                 
##  [99] distinct_1.8.0                zlibbioc_1.42.0              
## [101] compiler_4.2.0                dqrng_0.3.0                  
## [103] BiocIO_1.6.0                  RColorBrewer_1.1-3           
## [105] DESeq2_1.36.0                 Rsamtools_2.12.0             
## [107] cli_3.3.0                     XVector_0.36.0               
## [109] htmlTable_2.4.0               Formula_1.2-4                
## [111] MASS_7.3-57                   WGCNA_1.71                   
## [113] tidyselect_1.1.2              stringi_1.7.6                
## [115] yaml_2.3.5                    BiocSingular_1.12.0          
## [117] locfit_1.5-9.5                latticeExtra_0.6-29          
## [119] ggrepel_0.9.1                 grid_4.2.0                   
## [121] sass_0.4.1                    tools_4.2.0                  
## [123] parallel_4.2.0                rstudioapi_0.13              
## [125] bluster_1.6.0                 foreach_1.5.2                
## [127] foreign_0.8-82                metapod_1.4.0                
## [129] gridExtra_2.3                 Rtsne_0.16                   
## [131] digest_0.6.29                 BiocManager_1.30.17          
## [133] FNN_1.1.3                     shiny_1.7.2                  
## [135] Rcpp_1.0.8.3                  BiocVersion_3.15.2           
## [137] later_1.3.0                   httr_1.4.3                   
## [139] ggdendro_0.1.23               AnnotationDbi_1.58.0         
## [141] colorspace_2.0-3              XML_3.99-0.9                 
## [143] splines_4.2.0                 uwot_0.1.11                  
## [145] yulab.utils_0.0.4             statmod_1.4.36               
## [147] ggplotify_0.1.0               plotly_4.10.0                
## [149] xtable_1.8-4                  jsonlite_1.8.0               
## [151] dynamicTreeCut_1.63-1         UpSetR_1.4.0                 
## [153] flashClust_1.01-2             R6_2.5.1                     
## [155] Hmisc_4.7-0                   pillar_1.7.0                 
## [157] htmltools_0.5.2               mime_0.12                    
## [159] glue_1.6.2                    fastmap_1.1.0                
## [161] BiocNeighbors_1.14.0          interactiveDisplayBase_1.34.0
## [163] codetools_0.2-18              utf8_1.2.2                   
## [165] lattice_0.20-45               bslib_0.3.1                  
## [167] tibble_3.1.7                  curl_4.3.2                   
## [169] ggbeeswarm_0.6.0              gtools_3.9.2                 
## [171] magick_2.7.3                  GO.db_3.15.0                 
## [173] survival_3.3-1                limma_3.52.0                 
## [175] rmarkdown_2.14                munsell_0.5.0                
## [177] fastcluster_1.2.3             rhdf5_2.40.0                 
## [179] GenomeInfoDbData_1.2.8        iterators_1.0.14             
## [181] HDF5Array_1.24.0              impute_1.70.0                
## [183] reshape2_1.4.4                gtable_0.3.0

7 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

8 References

Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.

Chang, Winston, Joe Cheng, JJ Allaire, Cars on Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2021. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Gentleman, R, V Carey, W Huber, and F Hahne. 2018. “Genefilter: Methods for Filtering Genes from High-Throughput Experiments.” http://bioconductor.uib.no/2.7/bioc/html/genefilter.html.

Lun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor.” F1000Res. 5: 2122. doi:10.12688/f1000research.9501.2.

Marques, Sueli, Amit Zeisel, Simone Codeluppi, David van Bruggen, Ana Mendanha Falcão, Lin Xiao, Huiliang Li, et al. 2016. “Oligodendrocyte Heterogeneity in the Mouse Juvenile and Adult Central Nervous System.” Science 352 (6291): 1326–9.

McCarthy, Davis J., Kieran R. Campbell, Aaron T. L. Lun, and Quin F. Willis. 2017. “Scater: Pre-Processing, Quality Control, Normalisation and Visualisation of Single-Cell RNA-Seq Data in R.” Bioinformatics 33 (8): 1179–86. doi:10.1093/bioinformatics/btw777.

Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26). American Association for the Advancement of Science: eabb3446.

Risso, Davide, and Michael Cole. 2022. ScRNAseq: Collection of Public Single-Cell RNA-Seq Datasets.

Vacher, Claire-Marie, Helene Lacaille, Jiaqi J O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nat. Neurosci. 24 (10). Nature Publishing Group: 1392–1401.

Zhang, Jianhai, Jordan Hayes, Le Zhang, Bing Yang, Wolf Frommer, Julia Bailey-Serres, and Thomas Girke. 2022. SpatialHeatmap: SpatialHeatmap. https://github.com/jianhaizhang/spatialHeatmap.